arXiv:1505.01728v2 [cs.CV] 11 Aug 2015 


Integrating K-means with Quadratic Programming Feature Selection 

Yamuna Prasad*, K. K. Biswas

Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India


Abstract 

Several data mining problems are characterized by data in high dimensions. One of the popular ways to reduce
the dimensionality of the data is to perform feature selection, i.e., to select a subset of relevant and non-redundant
features. Recently, Quadratic Programming Feature Selection (QPFS) has been proposed, which formulates
the feature selection problem as a quadratic program. It has been shown to outperform many of the existing
feature selection methods for a variety of applications. Though better than many existing approaches, QPFS has a
running time that is cubic in the number of features, which can be quite computationally
expensive even for moderately sized datasets.

In this paper we propose a novel method for feature selection by integrating k-means clustering with 
QPFS. The basic variant of our approach runs k-means to bring down the number of features which need to 
be passed on to QPFS. We then enhance this idea, wherein we gradually refine the feature space from a very 
coarse clustering to a fine-grained one, by interleaving steps of QPFS with k-means clustering. Every step 
of QPFS helps in identifying the clusters of irrelevant features (which can then be thrown away), whereas 
every step of k-means further refines the clusters which are potentially relevant. We show that our iterative 
refinement of clusters is guaranteed to converge. We provide bounds on the number of distance computations 
involved in the k-means algorithm. Further, each QPFS run is now cubic in number of clusters, which can 
be much smaller than actual number of features. Experiments on eight publicly available datasets show that 
our approach gives significant computational gains (both in time and memory), over standard QPFS as well 
as other state of the art feature selection methods, even while improving the overall accuracy. 

Keywords: Feature Selection, Support Vector Machine (SVM), Quadratic Programming Feature Selection 

1. Introduction

Many data mining tasks are characterized by data in high dimensions. Directly dealing with such data
leads to several problems, including high computational costs and overfitting. Dimensionality reduction
is used to deal with these problems by bringing down the data to a lower dimensional space. For many
scientific applications, each of the dimensions (features) has an inherent meaning, and one needs to
keep the original features (or a representative subset) around to perform any meaningful analysis on
the data [?]. Hence, some of the standard dimensionality reduction techniques, such as PCA, which
transform the original feature space cannot be directly applied. Dimensionality reduction in such
scenarios reduces to the problem of feature selection. The goal is to select a subset of features which
are relevant and non-redundant. Searching for such an optimal subset is computationally intractable
(the search space is exponential) [?]. Amongst the current feature selection techniques, filter based
methods such as Maximal Relevance (MaxRel), Maximal Dependency (MaxDep) and
minimal-Redundancy-Maximal-Relevance (mRMR) are more popular because they can be used with alternate
classifiers and have reduced computational complexity [?, 10].

Recently, a new filter based quadratic programming feature selection (QPFS) method [?] has been
proposed, which has been shown to outperform many other existing feature selection methods. In this
approach, a similarity matrix representing the redundancy among the features and a feature relevance
vector are computed. These together are fed into a quadratic program to get a ranking on the features.
The computation of the similarity matrix requires quadratic time and space in the number of features.
Ranking requires cubic time in the number of features. This cubic time complexity can be prohibitively
expensive for carrying out the feature selection task in many datasets of practical interest. To deal
with this problem, Lujan et al. [?] combine QPFS with the Nystrom sampling method, which reduces the
space and time requirement at the cost of accuracy.

In this paper, we propose a feature selection approach by first clustering the set of features using
two-level k-means clustering [13] and then applying QPFS over the cluster representatives (called
Two-level K-Means QPFS). The key intuition is to identify the redundant sets of features using k-means
and use a single representative from each cluster for the ensuing QPFS run. This makes the feature
selection task much more scalable since k-means has linear time complexity in the number of points to
be clustered. The QPFS run is now cubic only in the number of clusters, which typically is much smaller
than the actual number of features. Our approach is motivated by the work of Chitta and Murty [13],
which proposes a two-level k-means algorithm for clustering the set of similar data points and uses it
for improving classification accuracy in SVMs. Chitta and Murty [13] show that their approach yields
linear time complexity in contrast to standard cubic time complexity for SVM training.

We further enhance our feature selection approach by realizing that instead of simply doing one pass of
k-means followed by QPFS, we can run them repeatedly to get better feature clusters. Specifically, we
propose a novel method for feature selection by interleaving steps of QPFS with MacQueen's [?] k-means
clustering (called Interleaved K-means QPFS). We gradually refine the feature space from a very coarse
clustering to a fine-grained one. While every step of QPFS helps in identifying the clusters of
irrelevant features (i.e., having 0 weights for the representative features), every step of k-means
refines the potentially relevant clusters. Clusters of irrelevant features are thrown away after every
QPFS step, reducing the time requirements. This process is repeated recursively for a fixed number of
levels or until each cluster has sufficiently small radius. Each QPFS run is now cubic in the number of
clusters (which are much smaller than the actual number of features and may be assumed to be constant).
We show that our algorithm is guaranteed to converge. Further, we can bound the number of distance
computations employed during the k-means algorithm.

We perform extensive evaluation of our proposed approach on eight publicly available benchmark datasets.
We compare the performance with standard QPFS as well as other state of the art feature selection
methods. Our experiments show that our approach gives significant computational gains (both in time and
memory), even while improving the overall accuracy.

In addition to Chitta and Murty [13], there is other prior literature which uses clustering to reduce
the dimensionality of the data for classification and related tasks. Examples include Clustering based
SVM (CB-SVM) [?], clustering based trees for k-nearest neighbor classification [?] and use of PCA for
efficient Gaussian kernel summation [?]. Reference [?] presents a framework for categorizing existing
feature selection algorithms and choosing the right algorithm for an application based on data
characteristics. To the best of our knowledge, ours is the first work which integrates the use of
clustering with existing feature selection methods to boost their performance. Unlike most previous
approaches, which use clustering as a one pass algorithm, our work interleaves steps of clustering with
feature selection, thereby reaping the advantage of clustering at various levels of granularity. The key
contributions of our work can be summarized as follows:

• A novel way to integrate the use of clustering (k-means) with existing feature selection
methods (QPFS)

• Bounds on the performance of the proposed algorithm

• An extensive evaluation on eight different publicly available datasets

The rest of the paper is organized as follows: We describe the background for the QPFS approach and
the two-level k-means algorithm in Section 2. Our proposed Two-level K-Means QPFS and Interleaved
K-Means QPFS approaches are presented in Sections 3 and 4, respectively. Experimental results are
described in Section 5. We conclude our work in the final section.

2. Background 

2.1. QPFS

Given a dataset with M features ($f_i$, $i = 1, \ldots, M$) and N training instances
($x_i$, $i = 1, \ldots, N$) with Y class labels ($y_i$, $i = 1, \ldots, Y$), the standard QPFS
formulation [?] is:

\[
f(\alpha) = \min_{\alpha} \; \frac{1}{2}\,\alpha^{T} Q \alpha - s^{T}\alpha
\quad \text{subject to} \quad \alpha_i \ge 0,\ i = 1, \ldots, M; \quad I^{T}\alpha = 1. \tag{1}
\]

where $\alpha$ is an M-dimensional vector, I is the vector of all ones and Q is an M x M symmetric
positive semi-definite matrix, which represents the redundancy among the features; s is an M-sized
vector representing the relevance score of features with respective class labels. In this formulation,
the quadratic term captures the dependence between each pair of features, and the linear term captures
the relevance between each of the features and the class labels. The task of feature selection involves
optimizing the twin goals of selecting features with high relevance and low redundancy. Considering the
relative importance of non-redundancy amongst the features and their relevance, a scalar quantity
$\theta \in [0, 1]$ is introduced in the above formulation, resulting in [?]:

\[
f(\alpha) = \min_{\alpha} \; \frac{1}{2}\,(1 - \theta)\,\alpha^{T} Q \alpha - \theta\, s^{T}\alpha
\quad \text{subject to} \quad \alpha_i \ge 0,\ i = 1, \ldots, M; \quad I^{T}\alpha = 1. \tag{2}
\]

In the above equation, $\theta = 1$ corresponds to the formulation where only relevance is considered.
In this case the QPFS formulation becomes equivalent to the Maximum Relevance criterion. When $\theta$
is set to zero, the formulation considers only non-redundancy among the features, that is, features with
low redundancy with the rest of the features are likely to be selected. A reasonable value of $\theta$
can be computed using

\[
\theta = \frac{\bar{q}}{\bar{q} + \bar{m}} \tag{2a}
\]

where $\bar{q}$ is the mean value of the elements of matrix Q and $\bar{m}$ is the mean value of the
elements of vector s. As $\theta$ is a scalar, the similarity matrix Q and the feature relevance vector
s in (2) can be scaled according to the value of $\theta$, resulting in the equivalent QPFS formulation
of Equation (1). QPFS can be solved by using any of the standard quadratic programming implementations,
but it raises space and computational time issues. The time complexity of the QPFS approach is
$O(M^3 + NM^2)$ and the space complexity is $O(M^2)$. To handle large scale data, Lujan et al. propose
to combine QPFS with the Nystrom method by working on subsamples of the data set for faster
convergence. This often comes at the cost of accuracy. The details are available in [?].
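To make formulation (2) concrete, the following is a minimal sketch that minimizes the QPFS objective over the simplex with projected gradient descent. This is not the solver used by the authors; the function names, the step size and the iteration count are illustrative assumptions, and any standard quadratic programming package could be substituted.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    shift = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            shift = candidate  # the last index passing the test gives the shift
    return [max(x - shift, 0.0) for x in v]


def balance_theta(Q, s):
    """theta = q_bar / (q_bar + m_bar), as in Equation (2a)."""
    q_bar = sum(sum(row) for row in Q) / (len(Q) * len(Q))
    m_bar = sum(s) / len(s)
    return q_bar / (q_bar + m_bar)


def qpfs_weights(Q, s, theta, step=0.1, iters=2000):
    """Minimise 0.5*(1-theta)*a'Qa - theta*s'a subject to a >= 0, sum(a) = 1.

    step and iters are assumptions chosen for small, well-scaled toy problems.
    """
    m = len(s)
    a = [1.0 / m] * m
    for _ in range(iters):
        grad = [
            (1.0 - theta) * sum(Q[i][j] * a[j] for j in range(m)) - theta * s[i]
            for i in range(m)
        ]
        a = project_to_simplex([a[i] - step * grad[i] for i in range(m)])
    return a


# Toy data (hypothetical): features 0 and 1 are highly redundant with each other.
Q = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
s = [0.8, 0.7, 0.1]
weights = qpfs_weights(Q, s, balance_theta(Q, s))
ranking = sorted(range(len(s)), key=lambda i: -weights[i])
```

With $\theta = 1$ the objective is linear, so all the weight moves to the single most relevant feature, which matches the Maximum Relevance behaviour described above.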


2.2. Similarity Measure 

Various measures have been employed to represent similarities among features [?]. Among these,
correlation and mutual information (MI) based similarity measures are more popular. The classification
accuracy can be improved with MI, as it captures nonlinear dependencies between a pair of variables,
unlike the correlation coefficient, which only measures the linear relationship between a pair of
variables [15, ?]. The mutual information between a pair of features $f_i$ and $f_j$ can be computed
as follows:

\[
MI(f_i, f_j) = H(f_i) + H(f_j) - H(f_i, f_j) \tag{3}
\]

where $H(f_i)$ represents the entropy of feature vector $f_i$ and $H(f_i, f_j)$ represents the joint
entropy between feature vectors $f_i$ and $f_j$ [?]. The following variant of mutual information can be
used as a distance metric [?]:

\[
d(f_i, f_j) = 1 - \frac{MI(f_i, f_j)}{\max(H(f_i), H(f_j))} \tag{4}
\]
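The mutual information and the derived distance can be sketched for discrete-valued features as follows. The sketch assumes each feature is a list of discrete values (continuous features would need to be discretized first, a step not covered here), and the handling of two constant features is an assumption made only to avoid division by zero.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(f_i, f_j):
    """MI(f_i, f_j) = H(f_i) + H(f_j) - H(f_i, f_j)."""
    return entropy(f_i) + entropy(f_j) - entropy(list(zip(f_i, f_j)))


def mi_distance(f_i, f_j):
    """d(f_i, f_j) = 1 - MI(f_i, f_j) / max(H(f_i), H(f_j))."""
    denom = max(entropy(f_i), entropy(f_j))
    if denom == 0.0:
        return 0.0  # assumption: two constant features are treated as identical
    return 1.0 - mutual_information(f_i, f_j) / denom


f = [0, 0, 1, 1]
g = [1, 1, 0, 0]   # a relabelling of f, so fully redundant with it
h = [0, 1, 0, 1]   # independent of f on this sample
print(mi_distance(f, g), mi_distance(f, h))
```

In this toy example the redundant pair is at distance 0 and the independent pair at distance 1, which is the behaviour a redundancy-based clustering of features relies on.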
2.3. MacQueen's K-Means Algorithm

This is a k-means clustering algorithm which runs in two passes. In the first pass, it chooses the
first k samples as the initial k centers, assigns each of the remaining $N - k$ samples to the cluster
whose center is nearest, and updates the centers. In the second pass, each of the N samples is assigned
to the cluster whose center is closest, and the centers are updated. The numbers of distance
computations in the first and second passes are $k(N - k)$ and $Nk$, respectively. Thus, the number of
distance computations needed in MacQueen's k-means algorithm is $2Nk - k^2$. This in effect means that
the complexity is $O(Nk)$ [13].

2.4. Two-level K-means Algorithm

Recently, a two-level k-means algorithm has been developed using MacQueen's k-means algorithm [13].
This clustering algorithm ensures that the radius of each cluster produced is less than a pre-defined
threshold $\tau$. The algorithm is outlined below:

Algorithm Two-level K-means(D, k, $\tau$)

Input: Data Set D, Initial Number of Clusters k and Radius Threshold $\tau$.

Output: Set of clusters C ($c_1, c_2, \ldots, c_i, \ldots$) (with radius $r_i \le \tau$) and the set of
cluster centers $\mu$.

1. (level 1:) Cluster the given set of data points into an arbitrarily chosen k' clusters using
   MacQueen's k-means algorithm.

2. Calculate the radius $r_i$ of the $i$-th cluster using $r_i = \max_{x_j \in c_i} d(x_j, c_i)$,
   where d(.,.) is the similarity metric.

3. (level 2:) If the radius $r_i$ of the cluster $c_i$ is greater than the user defined threshold
   $\tau$, split it using MacQueen's k-means with the number of clusters set to $(r_i/\tau)^{M}$,
   where M is the dimension of the data.

4. return the set of clusters (C) and corresponding centers ($\mu$) obtained after level 2.

[13] shows that the above two-level k-means algorithm reduces the number of distance calculations
required by MacQueen's k-means algorithm, while guaranteeing a bound on the clustering error (details
below). The difference between the number of distance computations by MacQueen's k-means algorithm and
the two-level k-means algorithm follows the inequality:

\[
U - \frac{N\alpha^{M}R}{\tau} \le ND_1 - ND_2 \le U + \frac{k'\alpha^{M}R}{\tau} \tag{5}
\]

where $ND_1$ and $ND_2$ are the numbers of distance computations in MacQueen's k-means algorithm and
the two-level k-means algorithm respectively, $\alpha \ge 0$ is some constant, R is the radius of the
ball enclosing all the data points and U is $(k - k')(2N - k - k')$. If $k' \ll k$, then the expected
number of distance computations in level 2 is upper bounded by $N\alpha^{M}R/\tau$. The parameter
$\tau$ obeys the following inequality:

\[
\frac{N\alpha^{M}R}{U} \le \tau \le R
\]

An appropriate choice of $\tau$ is obtained using the inequality

\[
\max\left(\frac{R}{k^{1/M}}, \frac{R}{(N - k)^{1/M}}\right) \le \tau \le R \tag{6}
\]

The clustering error in the two-level k-means algorithm is upper bounded by twice the error of optimal
clustering [13]. The time complexity of the two-level k-means algorithm is $O(Nk)$ and the space
complexity is $O(N + k)$. The detailed analysis of these bounds can be found in [13].
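The two-pass procedure of Section 2.3 and its count of $2Nk - k^2$ distance computations can be checked with a short sketch. The code below is an illustrative reading of that description, assuming Euclidean distance and a running-mean update in the first pass; the function name and the toy points are hypothetical.

```python
from math import dist  # available from Python 3.8


def macqueen_two_pass(points, k):
    """Two-pass MacQueen-style k-means; returns labels, centers, distance count."""
    centers = [list(p) for p in points[:k]]  # first k samples seed the centers
    sizes = [1] * k
    distance_count = 0

    # Pass 1: assign each remaining sample and update the winning center.
    for p in points[k:]:
        distances = [dist(p, c) for c in centers]
        distance_count += k
        j = distances.index(min(distances))
        sizes[j] += 1
        centers[j] = [c + (x - c) / sizes[j] for c, x in zip(centers[j], p)]

    # Pass 2: reassign all N samples, then recompute the centers once.
    labels = []
    for p in points:
        distances = [dist(p, c) for c in centers]
        distance_count += k
        labels.append(distances.index(min(distances)))
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers, distance_count


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centers, count = macqueen_two_pass(points, k=2)
print(labels, count)  # count equals 2*N*k - k*k = 20 for N = 6, k = 2
```

The first pass contributes $k(N - k) = 8$ distance computations and the second $Nk = 12$, which reproduces the total stated above.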

3. Two-level K-means QPFS 

Authors in [13] employ two-level k-means clustering for reducing the number of data points for
classification using SVM. We use a similar idea, except that we cluster a set of features instead of the
set of data points. We then apply QPFS on the representative set of features. Thus, the problem is
transformed into the feature space, in contrast with their formulation in the space of data points.
Another key distinction is that we need to work with actual features, unlike cluster means as in the
case of [13]. This is because the means of feature clusters are abstract points and may not correspond
to actual features over which feature selection could be carried out. Towards this end, we develop two
algorithms, the first one by modifying MacQueen's k-means algorithm and the other one by modifying the
two-level k-means algorithm [13] to return cluster representatives (features) in place of cluster
means. Each feature is represented as an N-dimensional vector, where N denotes the number of training
instances (see Section 2.1). The $i$-th component of this vector denotes the value of the feature in
the $i$-th data point. The distance metric between a pair of features is defined using mutual
information as in Equation (4).

In the following sections, M is the cardinality of the (feature) space to be clustered. This takes the
place of N, which is the cardinality of the (data) space in the case of Chitta and Murty [13].
Similarly, N denotes the dimensionality of the (feature) space to be clustered. This takes the place of
n, which is the dimensionality of the (data) space in the case of Chitta and Murty [13].

3.1. Variant MacQueen’s K-means 

We propose a variant of MacQueen's k-means algorithm for clustering the features, to produce a set of
clusters of redundant features instead of clustering data points. In each iteration of MacQueen's
k-means algorithm, the point nearest to the updated mean is selected as the new center (called the
cluster representative). Each iteration needs to compute the distance from $M - k$ features to k
centers and the distance from each center to the nearest feature in its cluster. Thus, each iteration
needs $k(M - k) + M$ distance computations. As MacQueen's k-means uses two iterations, the total number
of distance computations would be $2Mk - 2k^2 + 2M$. The complexity is thus $O(Mk)$.
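The step that distinguishes the variant, replacing each cluster mean by the nearest actual feature, can be sketched as follows. Euclidean distance is used here only as a stand-in assumption (the paper's metric is the mutual information distance of Equation (4)), and both function names are hypothetical.

```python
from math import dist  # available from Python 3.8


def cluster_representative(members):
    """Return the member nearest to the cluster mean and the distances spent."""
    mean = [sum(col) / len(members) for col in zip(*members)]
    distances = [dist(m, mean) for m in members]  # one distance per member
    return members[distances.index(min(distances))], len(members)


def variant_distance_count(M, k):
    """Two iterations, each costing k*(M - k) + M distance computations."""
    return 2 * (k * (M - k) + M)


rep, spent = cluster_representative([(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)])
print(rep, spent, variant_distance_count(100, 5))
```

Summing the representative search over all clusters costs M distance computations per iteration, which is where the extra $2M$ in the total comes from.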

3.2. Variant Two-level K-Means (TLKM)

We propose the two-level k-means algorithm (TLKM) by replacing MacQueen's k-means algorithm with its
variant in the two-level k-means algorithm as given in Section 2.4. It is important to note that we are
clustering features rather than the data points. TLKM returns the feature clusters along with
corresponding representatives. Following the arguments in [13], we can derive the bounds on the number
of distance computations for our proposed TLKM algorithm in a similar manner. The only difference is
that we have an additional $2M - k^2$ term as explained in Section 3.1. The bound for the difference in
the number of distance computations between variant MacQueen's k-means and TLKM is


\[
U - M\left(\frac{(\alpha^{N} + 1)R}{\tau} + 2\right) \le ND_1 - ND_2 \le U + \frac{k'\alpha^{N}R}{\tau} \tag{7}
\]

Here, U is $2(k - k')(M - k - k')$ and other parameters have the same definitions as in Equation (5) of
Section 2.4. Further, if $k' \ll k$, then the expected number of distance computations in the second
level is upper bounded by $M(2 + (\alpha^{N} + 1)R/\tau)$ and the parameter $\tau$ obeys the following
inequality

\[
\frac{M(\alpha^{N} + 1)R}{U - 2M} \le \tau \le R
\]

Following (6), for reducing the number of computations in the TLKM algorithm, it is necessary that

\[
\max\left(\frac{R}{k^{1/N}}, \frac{R}{(M - k)^{1/N}}\right) \le \tau \le R \tag{8}
\]

Following the arguments in [13], it can be shown that the time and space complexities of the modified
two-level k-means for clustering features are $O(Mk)$ and $O(M + k)$, respectively.


3.3. Two-level K-Means QPFS (TLKM-QPFS) Algorithm

We are now ready to present the QPFS based feature selection method using TLKM. We name this algorithm
TLKM-QPFS henceforth. We employ the TLKM approach to cluster the features in a given dataset, followed
by a run of QPFS. Algorithm TLKM-QPFS illustrates our proposed Two-level k-means QPFS (TLKM-QPFS)
approach.

Algorithm TLKM-QPFS(FS, k, $\tau$)

Input: Feature Set FS, Initial Number of Clusters k and Radius Threshold $\tau$.

Output: Final representative feature set F (features) in order of their $\alpha$ values.

1. Find the representatives F using the TLKM algorithm as defined in Section 3.2.

2. Apply QPFS on the cluster representatives F.

3. return Ranked F in the order of $\alpha$

Time and space complexities for the TLKM approach in step 1 are $O(Mk)$ and $O(M + k)$, respectively.
In step 2 of Algorithm TLKM-QPFS, the QPFS approach is used to rank the k cluster representatives
(features) obtained in step 1. Time and space complexities for this step are $O(k^3 + Nk^2)$ and
$O(k^2)$, respectively. Therefore, the total time and space complexities of Algorithm TLKM-QPFS are
$O(Mk) + O(k^3 + Nk^2) \approx O(M)$ and $O(M + k) + O(k^2) \approx O(M)$, respectively. It is clear
from this analysis that both the time and space complexities of this algorithm are $O(M)$ as
$k \ll M$.
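To get a feel for the gap between the two complexity expressions, the following sketch evaluates them as rough operation-count proxies with all constant factors ignored. The sizes M = 7129 and N = 72 are those of the Leukemia dataset in Table 1; the cluster count k = 100 is a purely hypothetical choice for illustration.

```python
def qpfs_cost(M, N):
    """Proxy for the O(M^3 + N*M^2) cost of plain QPFS."""
    return M**3 + N * M**2


def tlkm_qpfs_cost(M, N, k):
    """Proxy for the O(M*k) clustering cost plus O(k^3 + N*k^2) for QPFS on k representatives."""
    return M * k + k**3 + N * k**2


M, N, k = 7129, 72, 100  # k is an assumed value, not one reported in the paper
ratio = qpfs_cost(M, N) / tlkm_qpfs_cost(M, N, k)
print(f"plain QPFS proxy is about {ratio:.0f}x larger")
```

These are not running-time predictions; the point is only that the cubic term moves from M to k, so the saving grows quickly as k shrinks relative to M.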


4. Interleaved K-Means QPFS (IKM-QPFS) 


We now propose a new algorithm by combining the benefits of the clustering approach with QPFS. In this
proposed algorithm, we strive to refine the relevant feature space from coarse to fine-grained clusters
to improve accuracy while still preserving some of the computational gains obtained by TLKM-QPFS.
Algorithm TLKM-QPFS uses k-means to identify clusters of features which are similar to each other
(redundant). A representative is chosen for each of the clusters and then fed into QPFS. QPFS in turn
returns a ranking on these cluster representatives. Many of the representatives are deemed irrelevant
for classification ($\alpha = 0$). Amongst the sets of clusters whose representatives were deemed
irrelevant, consider those with cluster radius $r < \tau$. All the features in these clusters can be
considered irrelevant (since the cluster representative was irrelevant and the cluster radius is
sufficiently small) and can be thrown away. This also gives us an opportunity to further refine the
larger clusters ($r > \tau$), potentially improving accuracy by identifying a larger subset of relevant
features. This process of executing QPFS after an initial run of k-means clustering can be repeated
recursively. Each run of k-means further refines the relevant sub-clusters, whereas each run of QPFS
helps in identifying the relevant set of features. This leads to the following algorithm for feature
selection, which we have named Interleaved K-Means QPFS (IKM-QPFS).


4.1. Interleaved K-Means QPFS (IKM-QPFS) Algorithm

To start with, we first employ k-means to find a set of cluster representatives. These cluster
representatives are then fed into QPFS to get a feature ranking on them. A cluster with sufficiently
small radius ($r < \tau$) need not be refined further and can be directly used for the final level of
feature selection. Here, we throw away those representatives whose $\alpha$ values are zero (irrelevant
for classification). At the same time, clusters with radius greater than $\tau$ need to be refined
further. This can be done recursively using the above steps. In practice, we need to run the recursive
splitting of clusters only up to a user defined level. In our approach, we split each cluster into a
fixed number (k) of sub-clusters during k-means splitting. The proposed Interleaved K-Means QPFS
algorithm is presented in Algorithm IKM-QPFS.


Algorithm IKM-QPFS(FS, k, L, $\tau$)

Input: Feature Set FS, Number of Sub-Clusters k that each Cluster is split into, Radius Threshold
$\tau$ and Number of Interleaved Levels L.

Output: Ordered set of relevant features F

1. Apply variant MacQueen's algorithm to features in FS; obtain clusters C, cluster representatives f.

2. Apply QPFS on the cluster representatives f and obtain $\alpha$.

3. $l \leftarrow 1$

4. $F \leftarrow$ IRR(C, f, $\alpha$, k, $\tau$, l, L)

5. Apply QPFS on F and rank F according to $\alpha$.

6. return F


The sub-procedure IRR (Identify Relevant and Refine) is illustrated in Algorithm IRR.

Algorithm IRR(C, f, $\alpha$, k, $\tau$, l, L)

Input: Cluster Set C, Cluster representatives f, $\alpha$ obtained by QPFS, Number of Clusters k,
Radius Threshold $\tau$, Number of level l, and Maximum Number of Levels L

Output: Final centers F (features) in order of their $\alpha$ values.

1.  for each cluster $c_i \in C$
2.  do
3.      find the radius $r_i = \max_{f_j \in c_i} d(f_j, f_i)$; d(.,.) is the distance metric.
4.      if ($r_i < \tau$ or $l = L$)
5.          then if ($\alpha_i > 0$)
6.              then $F = F \cup \{f_i\}$
7.      else
8.          Apply variant MacQueen's k-means algorithm to features in cluster $c_i$.
            Obtain clusters C', cluster representatives f'
9.          Apply QPFS on the cluster centers f' and get $\alpha'$.
10.         $l' \leftarrow l + 1$
11.         $F' \leftarrow$ IRR(C', f', $\alpha'$, k, $\tau$, l', L)
12.         $F \leftarrow F \cup F'$
13. return F

In the above algorithm, the condition in step 4 checks if the boundary condition has been reached and
no more splitting needs to be done (i.e., the maximum number of levels L has been reached or
$r_i < \tau$). In that case, if the cluster is relevant ($\alpha_i > 0$), the corresponding features
are added to the feature set to be returned (step 6). Otherwise, they are discarded. The else branch in
step 7 goes on to recursively refine the clusters when the boundary condition is not yet reached.
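The recursion can be sketched in a few lines once clustering, QPFS and the radius computation are treated as pluggable pieces. In the sketch below, `radius_fn`, `split_fn` and `qpfs_fn` are hypothetical callables standing in for the variant MacQueen's k-means, the QPFS solver and the distance metric; the toy versions operate on one-dimensional "features" purely to make the control flow visible and are not the paper's procedures.

```python
def irr(clusters, reps, alpha, k, tau, level, max_level,
        radius_fn, split_fn, qpfs_fn):
    """Identify Relevant and Refine: return representatives of relevant clusters."""
    selected = []
    for members, rep, weight in zip(clusters, reps, alpha):
        if radius_fn(members, rep) < tau or level == max_level:
            if weight > 0:
                selected.append(rep)  # small (or last-level) and relevant: keep
            # small (or last-level) and irrelevant: discard
        else:
            sub_clusters, sub_reps = split_fn(members, k)
            sub_alpha = qpfs_fn(sub_reps)
            selected.extend(irr(sub_clusters, sub_reps, sub_alpha, k, tau,
                                level + 1, max_level,
                                radius_fn, split_fn, qpfs_fn))
    return selected


# Toy stand-ins (assumptions for illustration only).
def toy_radius(members, rep):
    return max(abs(m - rep) for m in members)


def toy_split(members, k):
    ordered = sorted(members)
    size = -(-len(ordered) // k)  # ceiling division
    groups = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return groups, [g[len(g) // 2] for g in groups]


def toy_qpfs(reps):
    return [0.0 if r >= 4 else 1.0 for r in reps]  # pretend values >= 4 are irrelevant


features = list(range(8))
result = irr([features], [3], [1.0], k=2, tau=1.5, level=1, max_level=3,
             radius_fn=toy_radius, split_fn=toy_split, qpfs_fn=toy_qpfs)
print(result)  # [1, 3]
```

Note that the cluster {4, 5, 6, 7} receives zero weight at the second level but is still split because its radius exceeds the threshold; its sub-clusters are discarded only once they are small enough.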

The recursive approach for a sub-cluster at the $i$-th level can be visualized as follows. In Figure 1,
sub-clusters 1 and k have radii greater than $\tau$. They are split further independent of their
$\alpha$ values. Their contribution to the final feature set is calculated by refining them
recursively. Sub-clusters 2, 3 and $k - 1$ have radii less than $\tau$. They do not need to be split
further. Amongst these, representatives for 2 and $k - 1$ contribute to the final set of features.
Sub-cluster 3 is discarded since $\alpha_3 = 0$.

4.2. Convergence 

In every recursive call of Algorithm IRR, all the clusters whose radius is greater than $\tau$ are
further split into k sub-clusters. Since every split is guaranteed to decrease the size of the original
cluster, and we have a finite number of features, the algorithm is guaranteed to terminate and find
clusters each of whose radius is less than $\tau$, given sufficiently large L. Note that in the extreme
case, a cluster will have only one point in it and hence its radius will be zero. Now, let us analyze
what happens in the average case, i.e., when the sub-cluster split induced by MacQueen's algorithm
results in uniform-sized clusters. More formally, let $r_i$ denote the radius of the cluster i (at some
level) which needs to be split further. Then, the volume enclosed by this cluster is $C \cdot r_i^{N}$
for some constant C. Here, N is the number of original data points (this is the space in which features
are embedded). By the assumption of uniform size, this volume is divided equally amongst all the
sub-clusters. Hence, the volume of each sub-cluster is going to be $C \cdot r_i^{N}/k$. This volume
corresponds to a sub-cluster of radius $r_i/k^{1/N}$. Hence, at every level, the cluster radius is
reduced by a factor of $k^{1/N}$. If the starting radius is R, then after l levels the radius of a
sub-cluster is given by $R/k^{l/N}$. We would like this quantity to be less than or equal to $\tau$.
This results in the following bound on l.

\[
\frac{R}{k^{l/N}} \le \tau
\;\Longrightarrow\; k^{l/N} \ge \frac{R}{\tau}
\;\Longrightarrow\; \frac{l}{N} \ge \log_k \frac{R}{\tau} \quad \text{(taking log)}
\;\Longrightarrow\; l \ge N \log_k \frac{R}{\tau}
\]

Hence, under the assumption of uniform splitting, continuation up to $N \log_k (R/\tau)$ levels will
guarantee that each sub-cluster has radius $\le \tau$. If $N \sim M$, then features are very sparsely
distributed in the data space, and the above is a very loose bound. On the other hand, if $N \ll M$ (as
is the case with many microarray datasets), then the above bound can be put to practical use.
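As a quick numerical check of this bound, the sketch below computes the number of levels $N \log_k (R/\tau)$ and the radius reached after a given number of uniform splits. The function names and the example values are illustrative assumptions.

```python
from math import log


def levels_needed(N, k, R, tau):
    """Levels after which a uniformly split cluster has radius at most tau."""
    return N * log(R / tau, k)


def radius_after(R, k, N, levels):
    """Radius after `levels` uniform k-way splits in an N-dimensional space."""
    return R / k ** (levels / N)


N, k, R, tau = 2, 4, 16.0, 1.0  # hypothetical values
l_bound = levels_needed(N, k, R, tau)
print(l_bound, radius_after(R, k, N, 4))
```

For these values the bound is four levels, and four uniform splits do bring the radius from 16 down to the threshold of 1. The linear dependence on N is what makes the bound loose when the number of instances is large.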
MacQueen's algorithm starts with the first set of k points as the cluster representatives, followed by
another pass of assigning the points to each cluster and then recalculating the cluster
representatives. In general, the assumption of uniform sub-clusters may only be an approximation to the
actual clusters which are obtained, and hence the above bound will also be an approximation. A detailed
analysis of whether one can bound this approximation is proposed to be carried out in future.

4.3. Distance Computations 

Distance computations done by interleaved steps of k-means in Algorithm IRR can be bounded as follows.
In the worst case, none of the clusters will be discarded and their radii will be greater than or equal
to the threshold ($\tau$) at each level. This will lead to recursive splitting of each cluster up to
level L. Now, consider cluster $c_j$ at level i. The number of distance computations required by
MacQueen's algorithm to split this cluster further is given by $2|c_j|k - 2k^2 + 2|c_j|$ (see Section
3.1). Thus, the total number of distance computations at level i is
$\sum_j (2|c_j|k - 2k^2 + 2|c_j|) \le 2Mk - k^2 + 2M$. The bound follows from the fact that the total
number of points in clusters at any level is $\sum_j |c_j| = M$ (since each cluster is split up to the
last level), while the $-2k^2$ term is incurred once per cluster. Therefore, this worst-case bound is
independent of the particular level. Hence, the total number of distance computations for Algorithm IRR
can be bounded by $L(2Mk - k^2 + 2M)$.
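The per-cluster count and the overall bound quoted above can be compared numerically. The sketch below assumes, purely for illustration, that every split is perfectly uniform so that level i holds $k^{i-1}$ clusters of equal size; the function names and the values of M, k and L are hypothetical.

```python
def level_distance_count(cluster_sizes, k):
    """Sum of 2|c|k - 2k^2 + 2|c| over the clusters split at one level."""
    return sum(2 * size * k - 2 * k * k + 2 * size for size in cluster_sizes)


def worst_case_bound(M, k, L):
    """The bound L * (2Mk - k^2 + 2M) on the total distance computations."""
    return L * (2 * M * k - k * k + 2 * M)


M, k, L = 64, 2, 3
total = 0
for level in range(1, L + 1):
    n_clusters = k ** (level - 1)           # uniform-split assumption
    sizes = [M // n_clusters] * n_clusters  # cluster sizes always sum to M
    total += level_distance_count(sizes, k)
print(total, worst_case_bound(M, k, L))
```

Under this assumption the counted total stays below the bound at every level, because the $-2k^2$ term is subtracted once for each cluster that is split.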

4.4. Time Complexity Analysis 

Time required in step 1 and step 2 of Algorithm IKM-QPFS is $O(Mk)$ and $O(k^3)$, respectively. In
step 4 of Algorithm IKM-QPFS, Algorithm IRR is called, which is executed recursively. The time required
for its execution can be computed as follows:

If the maximum number of levels is L and the number of clusters is k, then it can be easily shown that
the time complexity of Algorithm IRR is upper bounded by $O(LMk + k^{L+3})$. The first term comes from
the number of distance computations in k-means, and the second term comes from the $O(k^{L})$ calls to
QPFS ($k^{i-1}$ calls at level i), where each call takes $O(k^3)$ time.

Thus, the time required in step 4 of Algorithm IKM-QPFS is $O(LMk + k^{L+3})$ and the time required in
step 5 of Algorithm IKM-QPFS is $O(k^{3L})$.

As L and k are very small constants, the total time required by Algorithm IKM-QPFS is
$O(Mk) + O(k^3) + O(LMk + k^{L+3}) + O(k^{3L})$, which in effect is $O(M)$.

4.5. Interleaved K-Means Aggressive QPFS (IKMA-QPFS)

In this section, we present a variation on the IKM-QPFS algorithm described above. The key idea is that
after every QPFS run, we throw away all the clusters whose representatives are deemed irrelevant during
that run (i.e., $\alpha = 0$), independent of the radii of the corresponding clusters. This is a
deviation from the originally proposed algorithm, wherein we throw away a cluster only if the
corresponding $\alpha = 0$ and the cluster radius $r < \tau$. We call this variation Interleaved
K-Means Aggressive QPFS (IKMA-QPFS) since it is aggressive about discarding the clusters whose
representatives are deemed irrelevant. This potentially leads to an even larger gain in terms of
computational complexity, since IKMA-QPFS tries to identify the irrelevant feature clusters early in
the process and throws them away. But since some of these clusters can be large in size ($r > \tau$),
we might trade off the additional computational gain against a loss in accuracy. Interestingly, in our
analysis, we found that this aggressive throwing away of clusters almost always happened only towards
the deeper levels of clustering (i.e., very few representatives were deemed irrelevant in the beginning
levels of clustering), where the clusters were already sufficiently small. Hence, as we will see in our
experiments, not only does this variant perform better in terms of computational efficiency than
IKM-QPFS, it even simplifies the feature selection problem, giving improved accuracy in some cases.

For IKMA-QPFS, the only change in Algorithm IRR is before step 3 (i.e., right after the for loop
starts), where we need to put another check if ($\alpha_i = 0$). If this condition is satisfied, we
simply return out of the function. The rest of the algorithm remains the same. The convergence, the
distance computation and the time complexity analyses presented above also remain the same as for
IKM-QPFS. This is because all the analyses have been done for the worst case, when no clusters are
thrown away at intermediate levels.


Table 1: Datasets: detailed description

Dataset      No. of Instances   No. of Features   No. of Classes
WDBC         569                30                2
Colon        62                 2000              2
SRBCT        63                 2308              4
Lymphoma     45                 4026              2
Leukemia     72                 7129              2
RAC          33                 48701             2
MNIST        13966              784               2
USPS         1500               241               2

5. Experiments

We compare the performance of our proposed approaches
TLKM-QPFS, IKM-QPFS and IKMA-QPFS with QPFS, FGM and GDM on
eight publicly available benchmark datasets. We compare all
methods for their time and memory requirements and also for
their error rates at various numbers of top-k features selected.
The FGM and GDM methods work for binary classification datasets;
therefore, comparison with FGM and GDM is not carried out for
the SRBCT multi-class classification dataset. We observe an
improved accuracy for FGM and GDM on datasets normalized to the
range [-1, 1]. Therefore, we normalized all the datasets to the
range [-1, 1].
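
The text does not say how the normalization to [-1, 1] was done. A minimal sketch, assuming per-feature min-max scaling (our assumption, one common choice) and a hypothetical helper name:

```python
def scale_to_unit_range(column):
    """Min-max scale one feature column to [-1, 1].

    Assumption: per-feature min-max scaling; the paper only states the
    target range. A constant column is mapped to 0.0 to avoid dividing by zero.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


scaled = scale_to_unit_range([1, 2, 3])
```

Under this scheme the smallest value of each feature maps to -1 and the largest to 1.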

We plot the accuracy graphs while varying the number of top-k
features selected from 1 to 100 for all the datasets except
WDBC. For the WDBC dataset, we report the results up to the top
30 features, as this dataset has only 30 features. Next we
describe the details of the datasets and our experimental
methodology, followed by our actual results.

5.1. Datasets 

For our experimental study, we have used eight publicly
available benchmark datasets used by other researchers for
feature selection. The description of these datasets is
presented in Table 1. WDBC is the breast cancer Wisconsin
(diagnostic) dataset. Colon, SRBCT, Lymphoma, Leukemia and RAC
are microarray datasets [?], and the last two are vision
datasets.


5.2. Methodology 

The WDBC and USPS datasets are divided into 60% and 40% sized
splits for training and testing, respectively, as in [18]. The
MNIST dataset is divided into 11982 training and 1984 testing
instances following [19]. The reported results are the average
over 100 random splits of the data. The number of samples is
very small (less than 100) in the microarray datasets, so
leave-one-out cross-validation is used for these datasets. We
use mutual information as in [?] for redundancy and relevance
measures in the experiments. The data is discretized using three
segments and one standard deviation for computing mutual
information as in [?]. For QPFS, the value of the scale
parameter (θ) is computed using cross-validation from the set of
6 values {0.0, 0.1, 0.3, 0.5, 0.7, 0.9}. The error rates
obtained were very similar to the ones obtained using the scale
parameter based on Equation (?). For TLKM-QPFS, we used
cross-validation to determine the value of the expected number
of clusters k (to get r), and for IKM-QPFS and IKMA-QPFS, we
used cross-validation to determine good values of τ (threshold
parameter) and k' (initial number of clusters). The threshold
parameter τ is chosen from the set {0.70, ..., 0.99} with a step
size of 0.01. k is chosen from the set {5, ..., 1000} with a
step size of 5, and the k' parameter in IKM-QPFS (as well as in
IKMA-QPFS) is chosen from the range [3, 150].
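
The discretization and mutual information steps mentioned above can be sketched as follows. This is a minimal illustration under our own assumptions: the three segments are taken to be the values below μ - σ, within one standard deviation of the mean, and above μ + σ; the population standard deviation is used; and the function names are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def discretize(values):
    """Map each value to -1, 0 or 1 using the mean and one standard deviation.

    Assumption: the three segments are (-inf, mu - sigma), [mu - sigma,
    mu + sigma] and (mu + sigma, +inf).
    """
    mu, sigma = mean(values), pstdev(values)
    return [-1 if v < mu - sigma else (1 if v > mu + sigma else 0) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


levels = discretize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mi_identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In a QPFS-style setup, `mutual_information` between two discretized features would serve as the redundancy measure, and between a discretized feature and the class label as the relevance measure.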

After feature selection is done, a linear SVM (L2-regularized
L2-loss support vector classification in primal) [?] is used to
train a classifier using the optimal set of features output by
the QPFS, TLKM-QPFS, IKM-QPFS and IKMA-QPFS methods. FGM and GDM
are embedded methods, so the accuracies for both of these
methods are obtained according to [?]. The experiments were run
on an Intel Core i7 (3.10 GHz) machine with 8 GB RAM.
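
For the small microarray datasets, the leave-one-out protocol described above can be sketched as below. This is an illustration only: to keep it dependency-free, a nearest-centroid rule stands in for the linear SVM used in the paper, the list of selected feature indices is taken as given, and all names are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT the paper's linear SVM): closest class mean."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def loocv_error_rate(X, y, selected):
    """Leave-one-out error rate (%) using only the feature indices in `selected`."""
    errors = 0
    for i in range(len(X)):
        # Hold out sample i; train on all remaining samples.
        train_X = [[row[j] for j in selected] for t, row in enumerate(X) if t != i]
        train_y = [label for t, label in enumerate(y) if t != i]
        held_out = [X[i][j] for j in selected]
        if nearest_centroid_predict(train_X, train_y, held_out) != y[i]:
            errors += 1
    return 100.0 * errors / len(X)


# Hypothetical toy data: feature 0 is informative, feature 1 is misleading.
X = [[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [0.9, -4.0]]
y = [0, 0, 1, 1]
error_informative = loocv_error_rate(X, y, selected=[0])
error_misleading = loocv_error_rate(X, y, selected=[1])
```

Evaluating the same routine with the top-k features for increasing k would produce an error curve of the kind reported in the results.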

We have presented the variation in the error rates on varying
the values of τ with a fixed value of initial number of clusters
k'=15 in Figure 2, and on varying the values of initial number
of clusters k' with a fixed value of τ=0.8 in Figure 3, for the
Colon dataset. The other datasets show a similar trend.

Figure 2: Plot of accuracies (%) for Colon dataset using
IKM-QPFS with varying τ and varying number of top k (1-100)
features at fixed initial clusters k'=15

Figure 3: Plot of accuracies (%) for Colon dataset using
IKM-QPFS with varying number of initial clusters k' and varying
number of top k (1-100) features at fixed τ=0.8

5.3. Results 

5.3.1. Time and Memory 

Tables 2 and 3 show the time and memory requirements,
respectively, for feature selection done using each of the
methods for all datasets. On all the datasets, TLKM-QPFS,
IKM-QPFS and IKMA-QPFS are orders of magnitude faster than QPFS.
TLKM-QPFS is three times faster than GDM on the RAC, MNIST and
USPS datasets, while three to ten times slower than GDM on the
WDBC, Colon, Lymphoma and Leukemia datasets. Further, TLKM-QPFS
is an order of magnitude faster than FGM on the MNIST and USPS
datasets. IKM-QPFS and IKMA-QPFS are three to five times faster
than FGM and GDM on the RAC, MNIST and USPS datasets, while two
to five times slower on the WDBC, Colon, Lymphoma and Leukemia
datasets. The performance of TLKM-QPFS and IKM-QPFS is
comparable, while IKMA-QPFS is two to fifteen times faster than
TLKM-QPFS and two to six times faster than IKM-QPFS. This
reduction in time for IKMA-QPFS is due to the aggressive
throwing away of clusters when α becomes zero.
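
The speedups over QPFS implied by the execution times in Table 2 can be computed directly. The sketch below copies the QPFS, TLKM-QPFS, IKM-QPFS and IKMA-QPFS columns of Table 2 as they appear in this text (the RAC entry for QPFS is the Nyström variant); the variable names are ours.

```python
# Execution times in seconds, copied from Table 2:
# dataset -> (QPFS, TLKM-QPFS, IKM-QPFS, IKMA-QPFS).
# The RAC value for QPFS is for QPFS with the Nystrom method.
TIMES = {
    "Colon":    (104.90, 0.44, 1.39, 0.76),
    "SRBCT":    (164.42, 11.65, 15.22, 5.91),
    "Lymphoma": (938.73, 32.68, 12.41, 2.13),
    "Leukemia": (4864.69, 17.06, 38.35, 38.20),
    "RAC":      (9385.90, 46.43, 31.03, 38.89),
    "MNIST":    (161.69, 80.37, 60.51, 53.28),
    "USPS":     (1.34, 0.84, 0.80, 0.81),
}
METHODS = ("TLKM-QPFS", "IKM-QPFS", "IKMA-QPFS")

# Speedup = QPFS time divided by the time of each proposed method.
speedups = {
    dataset: {m: row[0] / t for m, t in zip(METHODS, row[1:])}
    for dataset, row in TIMES.items()
}

for dataset, by_method in speedups.items():
    cells = ", ".join(f"{m}: {s:.1f}x" for m, s in by_method.items())
    print(f"{dataset:9s} {cells}")
```

The printed ratios make it easy to see on which datasets the gain over QPFS is largest.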

QPFS ran out of memory for the RAC dataset, in contrast to the
TLKM-QPFS and IKM-QPFS approaches. Therefore, we use QPFS with
the Nyström method at Nyström sampling rate p = 0.05 for the RAC
dataset. The results for QPFS with the Nyström method are marked
with * in all the tables. On the RAC dataset, TLKM-QPFS and
IKM-QPFS are more than two orders of magnitude faster than QPFS
with Nyström.

TLKM-QPFS, IKM-QPFS and IKMA-QPFS require more than an order of
magnitude less memory compared to QPFS on all the datasets,
except MNIST. On MNIST, they require about as much memory as
QPFS. The memory required by FGM and GDM is marginally less than
that required by TLKM-QPFS, IKM-QPFS and IKMA-QPFS. The memory
requirements of TLKM-QPFS, IKM-QPFS and IKMA-QPFS are comparable
on all datasets.

The results in Tables 2 and 3 experimentally validate the
theoretical complexities for time and memory.

Table 2: Comparison of average execution times (in seconds).

Dataset      QPFS        FGM       GDM       TLKM-QPFS   IKM-QPFS   IKMA-QPFS
Colon        104.90      2.04      0.46      0.44        1.39       0.76
SRBCT        164.42      -         -         11.65       15.22      5.91
Lymphoma     938.73      0.21      1.30      32.68       12.41      2.13
Leukemia     4864.69     0.46      5.91      17.06       38.35      38.20
RAC          9385.90*    0.55      126.82    46.43       31.03      38.89
MNIST        161.69      984.16    268.35    80.37       60.51      53.28
USPS         1.34        18.50     3.75      0.84        0.80       0.81

Table 3: Comparison of average memory requirements (in KB).

Dataset      QPFS        FGM       GDM       TLKM-QPFS   IKM-QPFS   IKMA-QPFS
WDBC         544         1524      1068      500         524        524
Colon        84472       3824      2472      9727        10075      9884
SRBCT        100418      -         -         12279       12703      11231
Lymphoma     191549      4452      2936      14963       12504      10428
Leukemia     636103      10164     5808      13874       12503      10427
RAC          1456437*    27284     17220     25879       21807      19035
MNIST        97076       253264    88360     97111       97111      97111
USPS         40138       14540     4628      33435       10394      10394

5.3.2. Accuracy

To compare the error rates across various methods, we varied the
number of top features to be selected in the range from 1 to
100. For the RAC dataset, we varied the number of top features
at an interval of 5 in the range from 5 to 100. In Tables 4 and
5, a "-" corresponding to a method indicates that the experiment
was not done with that method. From Table 4 it can be observed
that our proposed IKMA-QPFS and IKM-QPFS methods achieve the
lowest error rates for all datasets. In general, IKMA-QPFS and
IKM-QPFS achieve the lowest error rates earlier than QPFS for
the WDBC, SRBCT, Lymphoma, Leukemia and USPS datasets, and
achieve the lowest error rates earlier than FGM and GDM for all
datasets except the Colon dataset (Tables 4 and 5). QPFS
achieves the lowest error rates earlier than FGM and GDM for the
WDBC, Colon, Lymphoma and Leukemia datasets. TLKM-QPFS achieves
the lowest error rates earlier than FGM for the WDBC, Colon,
Lymphoma and USPS datasets, and also achieves the lowest error
rates earlier than GDM for the WDBC, Lymphoma and USPS datasets.
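
Here a method reaches its lowest error rate "earlier" when it does so with fewer top-ranked features. The sketch below shows how a pair of entries of the kind reported in Tables 4 and 5 could be derived from an error curve; the function name and the example curve are hypothetical and are not values from the paper.

```python
def earliest_lowest_error(error_by_k):
    """Return (lowest error rate, smallest k at which it is reached).

    `error_by_k` maps a number of top features k to the error rate (%)
    obtained with those k features.
    """
    best = min(error_by_k.values())
    first_k = min(k for k, err in error_by_k.items() if err == best)
    return best, first_k


# Hypothetical error curve, for illustration only.
curve = {5: 12.0, 10: 8.0, 15: 3.0, 20: 3.0, 25: 4.5}
lowest, at_k = earliest_lowest_error(curve)
```

For this made-up curve the lowest error (3.0%) is first reached at k = 15, so a method whose curve reaches the same error at a smaller k would be said to achieve it earlier.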

Figures 4 and 5 plot the error rates for each dataset as the
number of top selected features is varied from 1 to 100. The
baseline here represents the accuracy obtained when all the
features are used for classification, and k-means-baseline
represents the accuracy obtained when all the representative
features (after two-level k-means) are used for classification.
It is evident from Figures 4 and 5 that the error rates achieved
by the TLKM-QPFS, IKM-QPFS and IKMA-QPFS methods are improved
over QPFS, FGM and GDM for each of the datasets. In all the
datasets, TLKM-QPFS and IKM-QPFS achieve lower error rates with
a smaller number of top selected features than QPFS. Usually,
IKM-QPFS and IKMA-QPFS achieve lower error rates earlier than
TLKM-QPFS. In Figure 4 and Figure 5, the plots of IKM-QPFS and
IKMA-QPFS overlap significantly.

As expected, the error rates come down as relevant features are
added to the set. Once the relevant set has been added, any
additional (irrelevant) features lead to a loss in accuracy.
Tables 6 and 7 present the average test set error rates for each
of the methods for the top ranked k features (k being 10, 20,
30, 50, 100), where the top k features are chosen as output by
the respective feature selection method. On all the datasets,
IKMA-QPFS performs significantly better than QPFS, FGM and GDM
at all the values of top k features selected. Further, error
rates for TLKM-QPFS, IKM-QPFS and IKMA-QPFS are comparable on
all datasets. This is particularly evident early on, i.e., for a
smaller number of top-k features. This points to the fact that
IKM-QPFS and IKMA-QPFS are able to rank the relevant set of
features right at the top.

TLKM-QPFS performs better than QPFS in all the cases (dataset
and number of top k features combination), except on Colon data
at k = 10¹. Among the proposed approaches (TLKM-QPFS, IKM-QPFS
and IKMA-QPFS), both IKM-QPFS and IKMA-QPFS are clear winners in
terms of accuracy.

¹ It performs marginally worse in a couple of cases (k = 10,
k = 30) for SRBCT.

Table 4: Lowest error rates (%)

Dataset      QPFS     FGM      GDM      TLKM-QPFS   IKM-QPFS   IKMA-QPFS
WDBC         3.28     3.49     3.26     3.5         3.20       3.20
Colon        12.9     11.29    16.13    11.29       9.68       8.06
SRBCT        0.00     -        -        0.00        0.00       0.00
Lymphoma     0.00     2.22     6.67     0.00        0.00       0.00
Leukemia     2.78     11.11    16.67    1.39        0.00       0.00
RAC          0.00*    6.06     0.00     0.00        0.00       0.00
MNIST        4.13     4.69     4.99     3.83        3.48       3.43
USPS         9.02     9.86     10.1     9.49        8.27       8.27

Table 5: Number of features corresponding to lowest error rates (%)

Dataset      QPFS     FGM      GDM      TLKM-QPFS   IKM-QPFS   IKMA-QPFS
WDBC         21       29       25       14          10         10
Colon        5        44       6        14          35         24
SRBCT        11       -        -        9           8          8
Lymphoma     17       47       100      8           7          6
Leukemia     24       37       38       39          34         34
RAC          10*      7        4        9           7          5
MNIST        95       20       75       85          99         99
USPS         93       54       64       34          67         67

5.4. Summary

It is clearly evident from the above results that all three of
our proposed approaches for feature selection, TLKM-QPFS,
IKM-QPFS and IKMA-QPFS, give significant gains in computational
requirements (both time and memory), even while improving the
overall accuracy in all cases when compared with QPFS, and give
significantly lower error rates when compared with FGM and GDM.
In particular, our proposed approaches help reach the relevant
set of features early on, which is a very important property of
a good feature selection method. The computational requirements
of TLKM-QPFS, IKM-QPFS and IKMA-QPFS are similar to each other.
On the large microarray dataset our proposed approaches are
faster than FGM and GDM. As for performance, IKMA-QPFS is a
clear winner among the three variants.

In Tables 6 and 7, TLKM, IKM and IKMA represent TLKM-QPFS,
IKM-QPFS and IKMA-QPFS, respectively.

Figure 4: Plots of error rates for each method with varying
number of top k (1-100) features for bioinformatics datasets

Figure 5: Plots of error rates for each method with varying
number of top k (1-100) features for vision datasets. Panels:
(a) MNIST Dataset, (b) USPS Dataset. Horizontal axis: Number of
Top Features. Legend: Baseline, K-Means, QPFS, FGM, GDM, TLKM,
IKMA, IKM.

6. Conclusion

In this paper, we proposed an approach for integrating k-means
based clustering with Quadratic Programming Feature Selection
(QPFS). The key idea involved using k-means to cluster together
redundant sets of features. Only one representative from each
cluster needed to be considered during the QPFS run for feature
selection, reducing the complexity of QPFS from cubic in the
number of features to cubic in the number of clusters (which is
much smaller). We presented two variations of our approach.
TLKM-QPFS used two-level k-means to identify a set of
representative features followed by a run of QPFS. In the more
sophisticated variant, IKMA-QPFS, we interleaved the steps of
k-means with QPFS, leading to a very fine-grained selection of
relevant features. Extensive evaluation on eight publicly
available datasets showed the superior performance of our
approach relative to existing state-of-the-art feature selection
methods.
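
To give a feel for the reduction from cubic in the number of features to cubic in the number of clusters, the following sketch compares the two cubic terms while ignoring constants and lower-order terms. The feature count is the Leukemia value from Table 1; the cluster count of 150 is a hypothetical value chosen only for illustration, and the function name is ours.

```python
def cubic_cost_ratio(num_features, num_clusters):
    """Ratio n^3 / k^3 of the cubic cost terms, ignoring constant factors."""
    return (num_features / num_clusters) ** 3


# 7129 features (Leukemia, Table 1) versus a hypothetical 150 clusters.
ratio = cubic_cost_ratio(7129, 150)
print(f"cubic term shrinks by a factor of about {ratio:,.0f}")
```

The ratio grows with the cube of n/k, which is why even a moderate reduction in the number of variables passed to QPFS can change the running time substantially.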

One of the key directions for future work involves providing a
generic framework for integrating a given clustering algorithm
with a filter-based feature selection method. Another direction
is extending our approach to sparse representations to deal with
data in very high dimensions (millions of features, such as in
vision). A third direction is developing a parallel formulation
of our proposed approach.
Table 6: Error rates (%) for bioinformatics datasets by each
method

                         k (number of top features)
Dataset      Method      10       20       30       50       100
WDBC         QPFS        6.07     3.41     3.49     -        -
             FGM         3.61     3.64     3.49     -        -
             GDM         4.54     3.72     3.36     -        -
             TLKM        4.43     3.72     -        -        -
             IKM         3.20     3.43     -        -        -
             IKMA        3.20     3.43     -        -        -
Colon        QPFS        14.52    24.19    19.36    20.97    24.19
             FGM         19.36    16.13    16.13    17.74    14.52
             GDM         22.58    30.65    30.65    25.81    24.19
             TLKM        20.97    14.52    14.52    17.74    17.74
             IKM         17.74    22.58    14.52    14.52    19.35
             IKMA        16.13    12.90    16.13    22.58    20.97
SRBCT        QPFS        3.18     0.00     0.00     0.00     0.00
             FGM         -        -        -        -        -
             GDM         -        -        -        -        -
             TLKM        1.59     0.00     1.59     0.00     0.00
             IKM         0.00     0.00     0.00     0.00     0.00
             IKMA        0.00     0.00     0.00     0.00     1.59
Lymphoma     QPFS        8.89     2.22     0.00     0.00     0.00
             FGM         6.67     8.89     8.89     8.89     4.44
             GDM         17.78    28.89    33.33    28.89    6.67
             TLKM        4.44     0.00     0.00     0.00     0.00
             IKM         8.89     0.00     0.00     0.00     0.00
             IKMA        0.00     0.00     0.00     0.00     4.44
Leukemia     QPFS        13.89    9.72     5.56     6.94     5.56
             FGM         20.83    19.44    19.44    13.85    13.89
             GDM         18.06    25.00    31.94    18.06    27.78
             TLKM        18.06    15.28    4.17     5.56     5.56
             IKM         13.89    13.89    6.94     4.17     4.17
             IKMA        13.89    13.89    6.94     4.17     4.17
RAC          QPFS+Nys    0.00     0.00     3.03     0.00     0.00
             FGM         9.09     9.09     15.15    15.15    18.18
             GDM         3.03     9.09     9.09     24.24    12.5
             TLKM        0.00     0.00     0.00     0.00     0.00
             IKM         0.00     0.00     0.00     0.00     0.00
             IKMA        0.00     3.03     9.09     9.09

Table 7: Error rates (%) for vision datasets by each method

                         k (number of top features)
Dataset      Method      10       20       30       50       100
MNIST        QPFS        15.83    9.47     6.25     5.40     4.83
             FGM         6.20     4.69     8.42     9.32     14.77
             GDM         5.79     5.79     5.79     5.79     5.85
             TLKM        10.18    6.40     5.99     4.89     4.03
             IKM         15.73    5.50     4.99     4.3      3.53
             IKMA        15.32    6.15     4.33     4.74     3.48

QPFS 